EMT Speed versus Python and C

EMT is using an interpreter for its top level language. This is not fast. EMT is only fast if the underlying vectorization is used.

For an example of this effect, let us write a loop which counts how many integers of the form

are divisible by 7.

>function test1 () ...
 res=0;
 for i=1 to 1000
   for j=1 to 1000
     if mod(i^2+j^2,7)==0 then res=res+1; endif;
   end
 end
 return res;
 endfunction

>tic; test1(), toc;

20164
Used 1.971 seconds

It is possible to speed this up using vectorization. This trick avoids the interpreted language and relies on the underlying C code to do the loops.

  - Generate a matrix M[i,j]=i^2+j^2.
  - Flatten the matrix to a vector.
  - Apply the "mod" function to the vector.
  - Sum the vector and count the ones.

The matrix generation is done by the matrix language of EMT.

>function test2 () ...
 M=(1:1000)^2+((1:1000)^2)';
 return sum(mod(flatten(M),7)==0)
 endfunction

>tic; test2(), toc;

20164
Used 0.029 seconds

Typically, we are now 50 times faster.

But this kind of tricks make EMT and Matlab hard to read languages. It might be better to use a low level language for the work.

Before we use C, we make the user directory current, so that the compiled code of C can be stored.

>cd(eulerhome())

C:\Users\reneg\Euler

Then we can write pure C code to count. The last line "new_real(res)" is necessary to put the result on the EMT stack.

>function tinyc test3 () ...
 int res=0;
 for (int i=1; i<=1000; i++)
    for (int j=1; j<=1000; j++)
      if ((i*i+j*j) % 7 == 0) res++;
 new_real(res);
 endfunction

Typically, this is 5 times faster than the vectorized code above, and thus 250 times faster than the loop in the basic language.

>tic; test3(), toc;

20164
Used 0.007 seconds

Another Example - Prime Numbers

We now look at an example where vectorization is not that easy to find.

For this, we count the number of prime numbers up to 100'000. EMT already has a function "isprime". It works for vectors. Thus we can use the following.

>tic; sum(isprime(1:100000)), toc;

9592
Used 0.75 seconds

The function "isprime" is partly vectorized as you can see in the line after "return". It works in the following way.

  - Check n dividing by all numbers 2,3,5,... to sqrt(n).
  - Determine if any check was positive.

Of course, it would be nicer to check only up to a positive index.

>type isprime

function map isprime (n: integer)
    if n < 2 then
        return 0;
    elseif n==2 or n == 3 or n == 5 or n== 7then
        return 1;
    else
        if mod(n,2)==0 then return 0; endif;
        return !any(mod(n,3:2:sqrt(n))==0);
    endif
endfunction

If we try to implement this without vectorization we end with a code similar to the following.

>function map emtisprime (n) ...
 if n<2 then return 0; endif;
 if n==2 then return 1; endif;
 if mod(n,2)==0 then return 0; endif;
 k=3;
 repeat
    while k*k<=n;
    if mod(n,k)==0 then return 0; endif;
    k=k+2;
 end
 return 1;
 endfunction

Let us first see if it works.

>n=1:100; n[nonzeros(emtisprime(n))]

[2,  3,  5,  7,  11,  13,  17,  19,  23,  29,  31,  37,  41,  43,  47,
53,  59,  61,  67,  71,  73,  79,  83,  89,  97]

This code shows how inefficient the programming language is when it comes to loops. We lose the factor of 5 typically compared to the partly vectorized code.

>tic; sum(emtisprime(1:100000)), toc;

9592
Used 4.171 seconds

As a conclusion, it is much better to use the matrix language for vectorization. This is true even if we loop more than necessary.

Let us try a bit of C code. This is much harder to understand since we need to pass the argument from EMT to C using the C macro "ARG_DOUBLE". We also use the "CHECK" macro.

>cd(eulerhome())

C:\Users\reneg\Euler

>function tinyc cisprimesing (n) ...
 ARG_DOUBLE(n);
 int m=(int)n;
 CHECK(n>0 && n==m,"Need positive integer.");
 int res=0;
 if (m==2) res=1;
 else if (m%2==0) res=0;
 else if (m>2)
 {
    int k=3;
    res=1;
    while (k*k<=m)
    {
        if (m%k==0) { res=0; break; }
        k+=2;
    }
 }
 new_real(res);
 endfunction

Now, this function does not vectorize and works only for a single number. So we use another function which maps to the arguments.

>function map cisprime (v : index) := return cisprimesing(v)
>n=1:100; n[nonzeros(cisprime(n))]

[2,  3,  5,  7,  11,  13,  17,  19,  23,  29,  31,  37,  41,  43,  47,
53,  59,  61,  67,  71,  73,  79,  83,  89,  97]

>tic; sum(cisprime(1:100000)), toc;

9592
Used 0.194 seconds

Typically, we gain a factor of 20 to our self-made code, and a factor of 4 to the vectorized code in the EMT function isprime().

Let us see how much the vectorization of "map" in EMT costs. For that, we vectorize in C.

>function tinyc cisprimemaps (v) ...
 ARG_DOUBLE_MATRIX(x,r,c);
 RES_DOUBLE_MATRIX(y,r,c);
 int i;
 for (i=0; i<r*c; i++)
 {
    int res=0;
    int m=(int)(*x);
    if (m==2) res=1;
    else if (m%2==0) res=0;
    else if (m>2)
    {
        int k=3;
        res=1;
        while (k*k<=m)
        {
            if (m%k==0) { res=0; break; }
            k+=2;
        }
    }
    *y++=res; x++;
 }
 endfunction

We now gain another factor of 500 to even the best non-C code in EMT.

>tic; sum(cisprimemaps(1:100000)), toc;

9592
Used 0.009 seconds

Using Python

Since C is hard to read and understand, EMT can use Python. The recent version can work with Python 3.8. If you have an older version for Python 2 the following timings my be different.

The code is now a direct translation of the simple loop that we used in the EMT language above.

>function python pyisprimesing (n) ...
 if n<2:
    return 0
 if n==2:
    return 1
 if n%2 == 0:
    return 0
 k=3;
 while k*k <= n:
    if n%k == 0:
        return 0
    k=k+2
 return 1
 endfunction

It does not vectorize. So we do that via "map".

>function map pyisprime (v : index) := return pyisprimesing(v)
>tic; sum(pyisprime(1:100000)), toc;

9592
Used 0.412 seconds

That is about 10 times faster than the EMT syntax.

Note that we can now use the function "pyisprime" in Python too.

>>> print(pyisprimesing(11))

Thus we can easily vectorize the function in Python itself.

>function python pyisprimemaps (v : index) ...
 w=list(v)
 for i in range(len(v)):
    w[i]=pyisprimesing(v[i])
 return w
 endfunction

This gains another factor of about 2.

>tic; sum(pyisprimemaps(1:100000)), toc;

9592
Used 0.105 seconds

But we still lose a factor of 10 compared to a vectorized version in C. Python is typically a lot less efficient that comparable C code.

Vectorization - The Sieve

The sieve of Eratosthenes is another example of a partial vectorization in EMT.

As you see in the code below, the EMT code for the sieve generates the multiples of prime numbers in each step and cancels them from the sieve using vectorized code. Note that the vector for the sieve contains only odd numbers which makes the understanding of the code a bit more complicated but makes it more efficient.

>type primes

function primes (n: integer)
    if n>=3
        len = floor((n-1)/2);
        sieve = ones ([1,len]);
        for i=1 to (sqrt(n)-1)/2;
            if (sieve[i]) then
                sieve[3*i+1:2*i+1:len] = 0; ..
            endif
        end
        return [2, 1+2*nonzeros(sieve)];
    elseif n>=2
        return 2;
    else
        return [];
    endif
endfunction

>primes(100)

[2,  3,  5,  7,  11,  13,  17,  19,  23,  29,  31,  37,  41,  43,  47,
53,  59,  61,  67,  71,  73,  79,  83,  89,  97]

The sieve is nevertheless enormously fast.

>tic; length(primes(1000000)), toc;

78498
Used 0.015 seconds

To beat this you will need a really good C code.

The following is a simple translation of the sieve algorithm in C. For the work area, we get a part of the EMT stack. We store each number as a character which is 1=prime and 0=non-prime to save space.

>cd(eulerhome())

C:\Users\reneg\Euler

>function tinyc cprimes (n) ...
 ARG_DOUBLE(n);
 int m=(int)n;
 CHECK(n>1 && m==n,"Need an integer for cprimes");
 char *v=getram(m);
 CHECK(v,"Could not allocate memory");
 for (int i=2; i<m; i++) v[i]=1;
 for (int i=2; i*i<=m; i++)
 {
    if (v[i])
        for (int j=2*i; j<m; j+=i) v[j]=0;
 }
 int count=0;
 for (int i=2; i<m; i++) count+=v[i]; 
 header *hd=new_matrix(1,count);
 double *x=matrixof(hd);
 for (int i=2; i<m; i++)
    if (v[i]) *x++=i;
 moveresults(hd);
 endfunction

>cprimes(100)

[2,  3,  5,  7,  11,  13,  17,  19,  23,  29,  31,  37,  41,  43,  47,
53,  59,  61,  67,  71,  73,  79,  83,  89,  97]

The result is not much faster than the EMT code.

>tic; length(cprimes(1000000)), toc;

78498
Used 0.014 seconds

Euler Home